Note: not all films have all the metadata, and many lack even a language entry (it is hard to tell how many, as the master Films table seems to contain duplicates). Some films may be multilingual, while other multi-entry ones are likely dubbed versions of the same film; so far there is no way to tell the original language (besides comparing to the country of origin, which is not precise). Most films list 1-2 languages.
Most have 1, some have 2-3, and a small number have 10+.
This and the following sections are based on Films, the main and largest table in the database. It has duplicate entries that we do not yet fully understand, as the same movie is often listed under multiple IDs. However, entries with a duration value (n=141472) contain no duplicate IDs; they do contain Title_Year duplicates, and 85460 entries remain once those are removed.
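The two duplicate checks described above can be sketched roughly as follows. This is only an illustration on made-up rows, not the code used for the report: the tuple layout `(ID, Title, Year, Duration)` and the rule of keeping the first entry per Title_Year key are assumptions.

```python
from collections import Counter

# Hypothetical rows from a Films-like table: (ID, Title, Year, Duration).
films = [
    (1, "Solaris", 1972, 167),
    (2, "Solaris", 1972, 167),   # same film listed under a second ID
    (3, "Solaris", 2002, 99),    # different film, same title, other year
    (4, "Stalker", 1979, None),  # no duration value, excluded below
]

# Keep only entries that have a duration value.
with_duration = [row for row in films if row[3] is not None]

# Check 1: is any ID repeated among these entries?
id_counts = Counter(row[0] for row in with_duration)
duplicate_ids = [film_id for film_id, n in id_counts.items() if n > 1]

# Check 2: collapse Title_Year duplicates, keeping the first entry per key
# (an assumed tie-breaking rule).
seen = set()
deduplicated = []
for film_id, title, year, duration in with_duration:
    key = (title, year)
    if key not in seen:
        seen.add(key)
        deduplicated.append((film_id, title, year, duration))

print(len(with_duration), len(duplicate_ids), len(deduplicated))
```

Keeping the first entry per key is the simplest choice; which of the duplicated IDs is the "right" one remains an open question.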
Same as above: films with a budget listed (n=29828) contain no duplicates. The units are unknown (thousands? hundreds of thousands? dollars?). The median Budget value is 1, and 9493 films have it listed as 0 (the highest bar).
Admissions info is present for only a few thousand films (first week admission: 5903, total: 1234, both: 963), but the majority of values are actually just 0s.
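To make the overlap in those counts explicit, here is a small sketch. It assumes that "both" means films with a first-week value and a total value at the same time (the intersection of the two groups); the film IDs are invented, and only the three reported counts come from the data.

```python
# Hypothetical sets of film IDs that have each kind of admissions value.
first_week_ids = {1, 2, 3, 4, 5}
total_ids = {4, 5, 6}

both_ids = first_week_ids & total_ids    # films with both values
either_ids = first_week_ids | total_ids  # films with at least one value

# Inclusion-exclusion applied to the counts reported above,
# under the same assumption about what "both" means.
n_first_week, n_total, n_both = 5903, 1234, 963
n_either = n_first_week + n_total - n_both

print(len(both_ids), len(either_ids), n_either)
```

Under that reading, 6174 films have at least one admissions value, before discounting the entries that are just 0.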
Non-zero box office totals are available for 1263 films. Units unknown.
Non-zero screening numbers are available for 932 films.
Another section in the database, on Market Participation, lists more data on individual films, going back to 2005.
Country matching for the map: 149 country codes from the data matched countries in the map, 0 failed to match, and 95 codes from the map were not represented in the data.
This could also be plotted as an actual network, where each node is a festival in a given year and edges mark films shared by festivals, but that just yields a massive hairball, so here’s a matrix instead, ordered by year (hover for labels).
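For illustration, a matrix of this kind could be built roughly as below. This is a sketch on invented data, not the code behind the plot: the festival names, film IDs and the `screenings` mapping are hypothetical, and the diagonal is simply left at 0.

```python
from itertools import combinations

# Hypothetical input: (festival, year) -> set of film IDs shown there.
screenings = {
    ("Festival A", 2018): {1, 2, 3},
    ("Festival B", 2018): {2, 3, 4},
    ("Festival A", 2019): {3, 5},
}

# One row/column per festival-year, ordered by year, then festival name.
nodes = sorted(screenings, key=lambda node: (node[1], node[0]))
index = {node: i for i, node in enumerate(nodes)}

# Symmetric matrix: each cell counts the films shared by two festival-years.
matrix = [[0] * len(nodes) for _ in nodes]
for a, b in combinations(nodes, 2):
    shared = len(screenings[a] & screenings[b])
    matrix[index[a]][index[b]] = shared
    matrix[index[b]][index[a]] = shared

for node, row in zip(nodes, matrix):
    print(node, row)
```

The same pairwise counts are what the edges of the network version would carry; the matrix just lays them out in a fixed order instead of letting a layout algorithm tangle them.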
The Film Crews part of the database has 1068821 entries, including 965556 unique full entries, encompassing 213078 films and 286725 people (not accounting for possible namesakes, and after excluding “NA” names and roles). Since that’s a bit much to put on a single plot, I’ll pick the 10 most prolific people and plot everyone in the movie crews they’ve been involved in. This yields a network of 2484 people (nodes are people, edges indicate they’ve cooperated on a movie). People who appear in multiple roles are colored by their most frequent role.
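The selection steps described above can be sketched like this. It is a toy version under stated assumptions: the crew rows are invented, "prolific" is taken to mean the number of distinct films a person appears in, and `TOP_N` is 1 here instead of 10 so the example stays small.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical crew rows: (film_id, person, role).
crew = [
    (1, "Ann", "Director"), (1, "Bob", "Editor"), (1, "Cy", "Producer"),
    (2, "Ann", "Director"), (2, "Bob", "Editor"),
    (3, "Ann", "Writer"), (3, "Dee", "Editor"),
    (4, "Eve", "Director"), (4, "Fay", "Editor"),
    (4, "NA", "Editor"),  # dropped below
]
TOP_N = 1  # the report uses 10

# Drop "NA" names/roles and exact duplicate rows, preserving order.
rows = list(dict.fromkeys(r for r in crew if r[1] != "NA" and r[2] != "NA"))

# Rank people by the number of distinct films they worked on (assumed metric).
films_of = defaultdict(set)
for film_id, person, _ in rows:
    films_of[person].add(film_id)
ranking = Counter({person: len(films) for person, films in films_of.items()})
top_people = [person for person, _ in ranking.most_common(TOP_N)]

# Collect the full crews of every film the top people were involved in.
top_films = set().union(*(films_of[p] for p in top_people))
crew_of = defaultdict(set)
for film_id, person, _ in rows:
    if film_id in top_films:
        crew_of[film_id].add(person)

# Nodes are people; an edge links two people who share at least one film.
nodes = set().union(*crew_of.values())
edges = {pair for members in crew_of.values()
         for pair in combinations(sorted(members), 2)}

# Color key: each person's most frequent role across all their rows.
role_counts = defaultdict(Counter)
for _, person, role in rows:
    role_counts[person][role] += 1
main_role = {p: role_counts[p].most_common(1)[0][0] for p in nodes}

print(top_people, sorted(nodes), len(edges), main_role["Ann"])
```

Note that matching people by name alone, as here, merges namesakes into one node, which is the same caveat that applies to the counts above.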
But to have some sort of overview, here is a big plot of all 60min+ movies with a 500+ character synopsis and non-duplicate titles (n=26725), arranged based on a simple topic model of their synopsis texts (movies with similar stories are closer; dimension-reduced to 2D using UMAP).
This has been just a quick overview; there’s plenty more info on crews, markets, movie metadata, synopsis texts, etc. to look into.